G3: Genes, Genomes, Genetics

Oxford University Press (OUP)

Preprints posted in the last 90 days, ranked by how well they match G3: Genes, Genomes, Genetics's content profile, based on 222 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Letter to the Editor regarding "Long-read genome sequencing provides novel insights into the harmful algal bloom species Prymnesium parvum" by Jian et al. (2024)

Wisecaver, J.; Jeje, T.; Watervoort, N. F.

2026-01-23 genomics 10.64898/2026.01.21.699772 medRxiv
Top 0.1%
19.4%

Jian et al. (2024) describe de novo genome assemblies for two strains of Prymnesium parvum (sensu lato, s.l.), a cryptic species complex of toxic, unicellular algae responsible for harmful algal blooms around the world. Here, we present evidence that the labels for UTEX 2797 and CCMP 3037 were inadvertently swapped by Jian et al. (2024). This resulted in sequence data labeled "UTEX 2797" but derived from strain CCMP 3037, and vice versa. Strain misidentification is a major risk with cryptic species like P. parvum s.l., and our reanalysis of the data in Jian et al. (2024) underscores the urgent need for clade-specific markers to ensure accurate and efficient strain identification.

2
Genome report: Genome sequence of Phymata mystica (Evans), an ambush bug

Grebler, E. E. C.; Mongue, A. J.

2026-03-02 genomics 10.64898/2026.02.27.708606 medRxiv
Top 0.1%
10.0%

Recent advances in sequencing technology have made the sequencing of non-model organisms significantly more streamlined and feasible. Using these technologies, we begin to address the lack of data on non-model organisms by sequencing the genome of one such species, Phymata mystica (Evans 1931), an ambush bug (Hemiptera: Heteroptera: Reduviidae: Phymatinae) specialized for floral sit-and-wait style predation. Our genome assembly is 710 Mb, of which 99.7% is assembled into 14 chromosomal scaffolds. We found that repetitive elements accounted for 58.85% of the sequence. We report 26,760 protein-coding genes in a preliminary annotation of the genome. Using these new resources, we explored both macrosynteny and gene conservation. Starting with chromosome structure, we found that P. mystica has a single X chromosome, unlike other well-assembled Reduviids in which the X apparently split into two linkage groups. Exploring this new annotation, we found a number of venom proteins conserved between P. mystica and the other venomous Heteroptera with reference genomes, primarily serine proteases, metallopeptidases and heteropteran venom family proteins. These results provide a new framework for the evolution of venom in this group of insects and further demonstrate the ease with which non-model species can be studied using modern genomic methods.

3
Population genomics of Drosophila pseudoobscura

Manat, Y.; Zheng, Z.; Kritzell, C. A.; Gonzales, C. A.; Meisel, R. P.

2026-02-04 genomics 10.64898/2026.02.02.703370 medRxiv
Top 0.1%
9.8%

Drosophila pseudoobscura is a long-standing model organism in evolutionary genetics because natural populations segregate for an inversion polymorphism on the third chromosome. In addition, D. pseudoobscura has a neo-X chromosome that was formed by an X-autosome fusion, which segregates for a sex-ratio drive allele. Previous genome-wide studies of DNA sequence variation in D. pseudoobscura have focused on individual chromosomes or did not use chromosome-scale reference genomes. To address these shortcomings, we generated a new D. pseudoobscura population genetic resource by sequencing the genomes of over 60 inbred lines sampled across the species' geographic range in North America. Using these data, we examined patterns of nucleotide diversity and population structure across the entire genome. Tajima's D was negative across most of the genome, consistent with a recent population expansion. However, there was substantial heterogeneity of D across chromosomes, suggesting distinct evolutionary dynamics across the genome. We found no strong evidence of population structure across most chromosomes, consistent with a near panmictic population. In contrast, we identified population structure on the third chromosome, which we attributed to the inversion polymorphism and used to infer the arrangements carried by the strains we sampled. Our analysis therefore demonstrates that tests for population structure can identify polymorphic chromosomal rearrangements. The population genomic data we have collected is publicly available and will support future research on genome evolution, local adaptation, and sex chromosome evolution in D. pseudoobscura.

4
Australian giant kelp genome assemblies show distinct Southern Hemisphere genetics

Scharfenstein, H. J.; Carroll, A.; Iha, C.; Schwoerbel, J.; Jordan, R.; Willis, A.

2026-02-21 genomics 10.64898/2026.02.20.707121 medRxiv
Top 0.1%
8.2%

Giant kelp, Macrocystis pyrifera, occurs across Northern and Southern Hemisphere temperate coasts and is at high risk from ocean warming. Few giant kelp forests remain across the Southeast Australian shelf, while a handful of forests are actively being restored. Genomic resources can greatly aid in the conservation of remnant populations and enhance restoration efforts. Reference genomes are a fundamental resource as they are a prerequisite to, or enhance, many analyses used in conservation genomics. A single reference genome is available for giant kelp, assembled from a Californian haploid specimen. However, increasing evidence of genetic divergence between Northern and Southern Hemisphere populations highlights the need for regionally representative reference genomes. Here, we present two genome assemblies from the diploid vegetative tissue of Australian giant kelp specimens. We performed de novo genome assembly using long-read sequencing (PacBio HiFi and ONT R10.4 Simplex) and scaffolded the assemblies with the ONT reads, assembling 98-99% of the genomes into 35 pseudo-chromosomes. Genome sizes ranged from 528-534 Mbp, with BUSCO completeness scores of 97-98% and QV scores of 51-52. Genome annotation identified 17,565-17,800 genes in the Australian genomes. Genomic divergence between the Australian and Californian giant kelp genomes was seven-fold greater than between Australian genomes (1.5% vs 0.2%), supporting a Northern-Southern Hemisphere genetic divergence. Functional divergence was also observed between Australian and Californian genomes, reflected by differing patterns of enrichment in gene ontologies linked to energy metabolism, proteostasis and stress responses. These two new genome assemblies will serve as valuable resources for ongoing research into Southern Hemisphere giant kelp genetics, while providing the basis for genomic-guided conservation and restoration of remnant giant kelp forests in Australia.

5
WITHDRAWN: A chromosome-level genome assembly of a vernal pool specialist amphibian, the Western Spadefoot, Spea hammondii

Thompsky, B.; Beraut, E.; Cooper, R. D.; Escalona, M.; Espinoza, R. E.; Fisher, R. N.; Miller, C.; Nguyen, O.; Sacco, S.; Sahasrabudhe, R.; Seligmann, W. E.; Tofflemier, E.; Wang, I. J.; Shaffer, H. B.

2026-02-04 genomics 10.1101/2025.11.16.688715 medRxiv
Top 0.1%
7.2%

We assembled and annotated a chromosome-level reference genome for the Western Spadefoot, Spea hammondii (Anura, Scaphiopodidae) representing one of only three amphibians included in the California Conservation Genomics Project (CCGP). Spea hammondii is a vernal pool breeding anuran native to California and northwestern Baja California which has undergone both range contractions and local extirpations across its distribution, primarily due to habitat loss and degradation and drought. The species is recognized by the state of California as a Species of Special Concern and is proposed for listing under the United States Endangered Species Act. Using the established CCGP pipeline, this S. hammondii genome was produced using Pacific Biosciences HiFi long-reads and Omni-C proximity ligation, resulting in a de novo genome assembly 1.14 Gb in length, distributed across 479 scaffolds (scaffold N50 = 120.8 Mb; largest scaffold = 183.6 Mb) with a BUSCO completeness score of 90.9% using a conserved tetrapod ortholog set. Our assembly shows high base accuracy (QV = 63.7) and low frameshift error in coding regions (QV 50.42). Annotation of this genome yielded 20,434 genes with a BUSCO completeness score of 94.7%. This reference genome, in combination with range-wide resequencing data from CCGP, will facilitate statewide population genomic assessments to delineate conservation units, quantify inbreeding and genomic load, and test for adaptive variation associated with vernal pool hydrology and drought tolerance, all of which are important considerations in the proposed federal listing.

6
A disjunct distribution and population fragmentation shape rangewide genetic diversity and structure of the endangered Physaria globosa (Brassicaceae)

Edwards, C. E.; Landon, C.; Bassuner, B.; Linan, A. G.; Albrecht, M. A.

2026-02-08 genetics 10.64898/2026.02.05.703860 medRxiv
Top 0.1%
7.1%

Population genetic analysis of species of conservation concern provides information to devise management plans to effectively conserve the genetic variation of endangered species. One such endangered plant, Physaria globosa, is a federally endangered species in the mustard family with a geographically restricted range that occurs in four disjunct locations in Indiana, Kentucky, and Tennessee (i.e., Highland Rim and Nashville Basin regions) and along the Wabash, Kentucky, and Cumberland Rivers. In this study, we sampled populations from throughout the range of P. globosa, genotyped them using 20 microsatellite loci, and assessed genetic diversity and structure within and among populations. The goals of the study were to understand: 1) levels of genetic diversity in P. globosa and whether populations show evidence of having experienced reductions in genetic diversity as the result of genetic bottlenecks, genetic drift, or inbreeding, 2) rangewide genetic diversity and structure in P. globosa and how genetic structure is affected by the disjunctions in the species' range, and 3) implications for prioritization of in-situ and ex-situ conservation efforts. On average, P. globosa showed comparable levels of genetic diversity to other species of Physaria. However, some populations showed evidence of inbreeding, genetic bottlenecks, or decreases in genetic diversity, possibly due to anthropogenic or climate-related pressures and decreases in population size due to competition with invasive bush honeysuckle. Genetic variation was strongly structured into two main geographic groups, one in the northern part of the species' range (KY and IN), and the other in the southern part of the species' range (TN), but some populations likely originated via long-distance dispersal. We also found significant isolation by distance, likely due to both life history characteristics and physical barriers associated with the complex topological structure of the landscape occupied by P. globosa, limiting population connectivity. Given the strong genetic structure found in P. globosa, several populations should be protected and managed within each geographic region to conserve genetic variation. Ex situ conservation will also be important to protect genetic diversity, particularly for populations that are difficult to access and manage.

7
SIN3 Regulates Transcriptional and Longevity Responses to Glycolytic Perturbation in Drosophila melanogaster

Amarasinghe, A. P.; Pile, L. A.

2026-03-11 genetics 10.64898/2026.03.09.710676 medRxiv
Top 0.1%
6.3%

Cellular metabolism and gene transcription are closely linked. The conserved transcriptional regulator SIN3 acts as a scaffold for histone deacetylase (HDAC)-containing complexes and is crucial for development, stress resistance, and overall organismal health. SIN3 regulates metabolic gene expression in Drosophila cultured cells; however, an understanding of the extent of its role in coordinating responses to metabolic stress in whole organisms is incomplete. In this study, we explored how SIN3 controls glycolytic gene expression across developmental stages and under genetic and dietary disruption of glycolysis in Drosophila melanogaster. Focusing on four key glycolytic enzymes, phosphofructokinase (Pfk), enolase (Eno), pyruvate kinase (Pyk), and pyruvate dehydrogenase beta (Pdhb), we found that reducing Sin3A levels increases their expression in both larvae and adults, indicating that SIN3 plays a consistent role in balancing metabolic gene transcription. Genetic interaction experiments indicate that Sin3A interacts with Pyk and Eno, regulating transcription in a gene-specific manner. Disrupting glycolysis via genetic or dietary means alters glycolytic gene expression, and SIN3 modulates this response. These findings indicate that SIN3 functions as a metabolic sensor, regulating transcription in response to cellular metabolic stress. Additionally, we demonstrate that reducing Sin3A levels shortens Drosophila lifespan on both low- and high-sucrose diets, emphasizing the importance of SIN3 in longevity. Overall, these results show that SIN3 is a context-dependent regulator of glycolytic gene expression and lifespan in Drosophila, integrating metabolic signals with chromatin-based transcriptional regulation.

Summary: To survive and thrive, organisms must adapt to distinct metabolic inputs. We investigated the response of the conserved transcriptional regulator SIN3 to metabolic stress and its control of glycolytic gene expression in Drosophila melanogaster. By measuring glycolytic gene expression, testing genetic interactions, and assessing lifespan under genetic and dietary perturbations, we found that Sin3A knockdown elevates glycolytic gene expression in a gene-specific manner and decreases longevity. SIN3 also modulates transcriptional responses to disrupted glycolysis and influences lifespan under sucrose stress. These findings identify SIN3 as a context-dependent transcription regulator that links gene expression with organismal metabolic adaptation.

8
Uncertainty-aware breeding decisions: MCMC-based optimum contribution selection increases breeding decision robustness

Ahlinder, J.; Waldmann, P.

2026-03-18 genetics 10.64898/2026.03.15.711440 medRxiv
Top 0.1%
6.2%

Current optimum contribution selection (OCS) implementations use point estimates of estimated breeding values (EBVs), potentially leading to suboptimal selections when individuals have uncertain genetic evaluations. We developed a framework assessing how EBV uncertainty affects OCS decisions through MCMC-based approaches using the COSMO optimizer in Julia, evaluated on Norway spruce (Picea abies, n=5,525) and Loblolly pine (Pinus taeda, n=926) populations. Agreement between point estimate (MAP-OCS) and MCMC-OCS was surprisingly low: mean overlap of only 26.6 (4.8) individuals in Norway spruce genotyped subpopulation and 14.1 (3.6) in full pedigree, with Loblolly pine intermediate at 16.0 (9.6). Despite this low individual-level agreement, selection frequency across MCMC iterations corresponded well with EBV rankings (Spearman ρ = 0.782 for Norway spruce), confirming that higher-EBV individuals were preferentially selected under posterior uncertainty. To comprehensively quantify uncertainty impacts, we employed two complementary metrics: individual robustness scores measuring genetic gain stability upon candidate removal, and population-level contribution distribution metrics capturing concentration of genetic gain across selected individuals. Applying these metrics identified 25 high-risk individuals in Norway spruce and nine in Loblolly pine, and constrained exclusion of these individuals improved individual robustness by 16.5% in Loblolly pine (3.00% genetic gain loss) and 29.8% in Norway spruce (2.14% genetic gain loss). Our uncertainty-aware OCS framework successfully identifies unstable selections that may compromise long-term genetic gain, and we recommend assessing EBV uncertainty through posterior distributions and evaluating population-specific trade-offs when implementing uncertainty-aware selection strategies.

9
Chromosome-level genome assembly of the Erythrina Gall Wasp, Quadrastichus erythrinae (Hymenoptera: Eulophidae)

Zhang, Y. M.; Merondun, J.; Corpuz, R. L.; Kauwe, A. N.; Geib, S. M.; Sim, S. B.

2026-01-25 genomics 10.64898/2026.01.23.701414 medRxiv
Top 0.1%
4.9%

The erythrina gall wasp, Quadrastichus erythrinae, is an invasive gall-inducing chalcidoid wasp and a major pest of the endemic wiliwili tree (Erythrina sandwicensis) in Hawaiʻi. As a foundation to associated research, we generated a chromosome-level genome assembly from a wild-collected female measuring <2 mm. The final assembly consists of five scaffolds representing the five autosomes totaling 399 Mb (N50 = 75.6 Mb) and one unplaced 16 kb contig. BUSCO analysis recovers 89.8% of conserved Hymenoptera orthologs, representing the first chromosome-scale genome for the genus Quadrastichus. Comparative genomic analyses reveal conservation across Hymenoptera despite deep evolutionary divergence, with strongest collinearity to the chalcidoid Nasonia vitripennis. Genome size variation is largely explained by repeat content, and Q. erythrinae exhibits high proportions of unclassified transposable elements similar to cynipid gall inducers. We also assembled a complete genome of its endosymbiont, Wolbachia pipientis. Together, these genomic resources provide a foundation for comparative, evolutionary, and applied research aimed at managing this invasive pest.

10
High-efficiency targeted integration of extrachromosomal arrays in C. elegans using PhiC31 integrase

Rich, M. S.; Pellow, R.; Hefel, A.; Rog, O.; Jorgensen, E. M.

2026-01-21 genetics 10.1101/2025.11.11.687718 medRxiv
Top 0.1%
4.8%

Extrachromosomal arrays are unique chromosome-like structures created from DNA injected into the C. elegans germline. Arrays are easy to create and allow for high expression of multiple transgenes. They are, however, unstable unless integrated into a chromosome. Current methods for integration, such as X-rays and CRISPR, damage DNA and are low-efficiency. Here, we demonstrate that the viral integrase PhiC31, which mediates a non-mutagenic recombination between short attB and attP sequences, can be used for extremely efficient and targeted integration of arrays. In this method, a transgene, a selectable marker, and attP sites are injected into the gonad of a strain that (1) has an attB site in its genome, and (2) expresses PhiC31 in its germline. F1 extrachromosomal arrays are cloned, grown for multiple generations with selection, and then screened for homozygous array integrations. The procedure is simple, requires less time than screening for extrachromosomal arrays, and arrays can be screened for transgene function after stable integration. Arrays that transmit are integrated by PhiC31 with 50-95% efficiency, allowing for the isolation of many unique integrants from a single injection. Arrays can also be integrated at fluorescent landing pads and arbitrary sites in the genome. Using nanopore sequencing, we show that three new integrated arrays are between 1.6 and 18 megabases in length, assemble with large repeats, and can contain hundreds of copies of injected transgenes. We have built a collection of strains and plasmids to enable array integration at multiple sites in the genome using various selections. PhiC31-mediated Integration of Arrays of Transgenes (PhiAT) will allow C. elegans researchers to shift from using unstable extrachromosomal arrays to directly integrating arrays.

11
Telomeric amplicons of SUL1 and Y' in yeast are generated by microhomology-mediated break induced replication occurring in cis

Brewer, B. J.; Martin, R.; Ramage, E.; Payen, C.; Di Rienzi, S. C.; Zhao, Y.; Zane, K.; Verhey, J.; Galey, M.; Miller, D. E.; Ong, G. T.; McKee, J. L.; Alvino, G. M.; Dunham, M. J.; Raghuraman, M. K.

2026-04-09 genetics 10.64898/2026.04.07.716220 medRxiv
Top 0.1%
4.8%

Gene amplification is a potent driver of evolution and is thought to contribute to genetic diseases, including cancer. The yeast Saccharomyces cerevisiae is a powerful organism for understanding amplification mechanisms. When yeast is grown long term in sulfate-limiting chemostats, amplification of the gene that encodes the primary sulfate transporter, SUL1, is a common outcome. Here we describe a form of SUL1 amplification in which multiple copies of the right terminal region of chromosome II are appended in tandem to a native telomere. We find this form of amplicon when we delete the origin of replication next to SUL1 or delete a variety of genes involved in DNA metabolism. It is the only form of amplification found in a yku70Δ mutant, suggesting that unprotected telomeres are involved. We propose that these terminal addition events occur when the unprotected 3' G1-3T telomeric sequence invades a short (~7 bp) internal telomere sequence (ITS) to begin a form of microhomology-mediated break-induced replication (mmBIR) that has been documented in type-I survivors of telomerase mutants. In addition to amplification of the right end of chromosome II, we also find that telomeres containing the sub-telomeric repeat Y' experience similar tandem amplification events and show that their formation is reduced in a pol32Δ mutant, a gene required for mmBIR. Within individual amplicons the ITSs and Y' elements are nearly identical, suggesting that the multiple copies of the amplified region are generated in a single mmBIR event that we describe as pseudo-rolling circle mmBIR. A similar amplification event at the P-telomere of human chromosome 18 has four copies of a ~54 kb region separated by ITSs of nearly identical size. This finding suggests that these additional copies of the terminal fragment of human chromosome 18 arose by the same pseudo-rolling circle mechanism, perhaps during a period of telomeric stress.

Author summary: The human genome is peppered with duplicates (or higher numbers) of segments that are located at sites both nearby and distant from the original, ancestral segments. These Copy Number Variants, or CNVs, appear to be highly variable among different individuals and are being examined with great interest as potential loci associated with genetic disease. Experimentally determining how these CNVs arise and become distributed across the genome is nearly impossible using humans. We are using budding yeast as the model organism to explore mechanisms of gene amplification. In this work we show that destabilizing the ends of yeast chromosomes (telomeres) or interfering with genes involved in the replication, repair, or recombination of DNA results in a specific form of segmental copy number increase that is initiated at telomeres. We propose that a telomere invades an internal chromosome site and sets up a pseudo-circular template for conservative DNA replication. The outcome is a chromosome with multiple, identical copies of a chromosome end arranged in tandem. We believe that it is also a major mechanism used by cells to repair telomeres that have become eroded during aging.

12
Allelic Association Analyses: Estimation Recommendations

Weir, B. S.; Goudet, J.

2026-01-30 genomics 10.64898/2026.01.26.701864 medRxiv
Top 0.1%
3.9%

We review the rich literature on the estimation of measures of inbreeding, relatedness and population structure, beginning with Sewall Wright's F-statistics and moving on to the descriptive statistics of Masatoshi Nei and Clark Cockerham. The current availability of genome-level single nucleotide variant data is allowing for sophisticated treatments of inferred identity by descent segments and inferred ancestral recombination graphs. Underlying such disparate methods is an emphasis on characterizing the descent status of alleles within and between individuals and populations and we have found allele-sharing statistics a convenient framework for examining the differences and similarities among different estimators. We have been able to resolve some long-standing reported differences among estimators, especially those involving the work of Nei. In the course of our algebraic and empirical treatment of descent measure estimation we have been able to formulate a set of five recommendations. Following the early work of Sewall Wright, we recommend 1. State that descent measures for pairs of alleles are relative to values in a reference set of allele pairs. With this view, we recommend 2. Use estimators that preserve descent measure rankings over different reference sets. Allele-sharing estimators satisfy this recommendation. Reducing genotypic data to allelic data has the benefit of reducing dimensionality, but we recommend 3. If genotypic data are available, avoid having to assume Hardy-Weinberg equilibrium by not reducing them to allelic data. Partly as a consequence of working with genotypic data, we recommend 4. Recognize that allele frequencies do not need to be estimated. Not estimating allele frequencies prevents the confounding of descent estimates for target pairs of alleles by the status of all pairs in a reference set. On the basis of both theoretical and empirical results, finally we recommend 5. Consider both inbreeding and kinship when estimating either one. It is difficult to envisage a natural population with relatedness but no inbreeding, or vice versa.

13
Introgression from the wild relative Manihot glaziovii on cassava (M. esculenta) chromosome 1 exhibits segregation distortion and no direct effect on dry matter

Villwock, S. S. C.; Rabbi, I. Y.; Ikpan, A. S.; Ogunpaimo, K.; Nafiu, K.; Kayondo, S. I.; Wolfe, M.; Jannink, J.-L.

2026-02-21 genetics 10.64898/2026.02.20.707074 medRxiv
Top 0.1%
3.9%

The cassava (Manihot esculenta) genome has two large introgressions from its wild relative M. glaziovii on chromosomes 1 and 4 that originate from historical hybridization efforts. The 10 Mbp chromosome 1 introgression has been increasing in frequency in African breeding populations due to its statistical association with higher dry matter content and root number. However, the region also exhibits suppressed recombination, hindering breeders' ability to combine favorable glaziovii alleles with the cultivated esculenta background. Since homozygous introgressed lines are rarely selected for advanced trials, dominance effects have not been well-characterized. To analyze the effects of the introgression with higher resolution, we generated a population of over 5000 seedlings from crosses between heterozygous introgressed parents and screened for recombinants using ten KASP markers tagging glaziovii-specific alleles. An optimized subset of 453 lines was then selected and evaluated over two years for yield and vigor traits. Unlike previous studies, composite interval mapping and mixed linear models showed no significant associations between glaziovii alleles and dry matter content or root number. Small, opposing effects on clonal vigor were observed at different ends of the introgression. The region showed significant segregation distortion and enrichment of putative deleterious alleles. Genome alignment of M. esculenta and M. glaziovii assemblies did not show any major structural variants in the introgression region, suggesting that suppressed recombination is likely driven by sequence-level divergence rather than structural rearrangements. These results indicate that the glaziovii introgression does not directly contribute to dry matter, supporting the need for recombination and purging of the glaziovii introgression to aid cassava improvement.

Plain language summary: A large chromosome segment from a wild relative of cassava is an important structural aspect in the cassava genome. Since the chromosome segment tends to be inherited as one block, its effects on cassava traits were not well resolved. Through genetic mapping at higher resolution, we identified that the wild segment impacts early vigor and does not appear to impact dry yield, as was previously thought. While there are no major structural differences between the wild and cultivated chromosome segments, their overall divergence seems to suppress the wild chromosome segment from pairing with the cultivated chromosome segment during reproduction. In the apparent absence of any major benefits from the wild segment, removing it from the breeding population may be beneficial.

Core ideas:
- A set of glaziovii allele-specific markers was designed to track the chromosome 1 introgression haplotype.
- Segregation distortion suggests the presence of recessive deleterious or lethal alleles in the introgression.
- Increased recombination is needed to purge deleterious alleles enriched in the introgression region.
- The glaziovii introgression was associated with slightly lower vigor rating and stem diameter.
- The effects of the previously-identified glaziovii DM QTL were not detected in this population.

14
Climate cycles drive demographic history and genomic divergence in cactus wrens (Campylorhynchus brunneicapillus) across North American warm deserts

Rodriguez-Rojas, P. C.; Oceguera-Figueroa, A. F.; Navarro-Siguenza, A. G.; Vazquez Miranda, H.

2026-03-26 evolutionary biology 10.64898/2026.03.24.714001 medRxiv
Top 0.1%
3.7%

In this study, we characterized the genetic structure and reconstructed the demographic history of cactus wrens (Campylorhynchus brunneicapillus), an endemic species of desert regions of North America that shows clear phenotypic and genotypic variation. We evaluated the effects of historical climate change on the structure and population dynamics of desert species using genomic data through genotyping by sequencing (GBS) and applied a population structure analysis (FST and ADMIXTURE), revealing two genetically differentiated groups: one continental and another peninsular in Baja California. Subsequently, we implemented the MSMC2 coalescent model on data divided into autosomal regions and the Z sex chromosome to estimate changes in effective population size (Ne) through evolutionary time. Additionally, we developed ecological niche models (ENMs) projected to the Last Glacial Maximum (LGM), Last Interglacial (LIG), Present times, and Future (2060-2080). Results indicate that both populations maintained moderate Ne values before the LGM, experienced severe bottlenecks (Ne ~ 10^2-10^3), followed by a sustained expansion. However, recovery was limited to the Z chromosome of the peninsular population. These findings reveal how glaciations and interglacials shaped the evolutionary history of desert species and provide genomic evidence of the splitting of C. affinis from C. brunneicapillus.

Article summary: This research examines how climate changes shaped genetic diversity of cactus wrens across North American warm deserts. Using coalescent methods, researchers tracked effective population size changes over 100,000 years; using ecological niche modeling, they predicted habitat suitability across climate periods. Results showed that continental and peninsular populations experienced bottlenecks during the Last Glacial Maximum, followed by demographic recovery during warm periods. However, the sex chromosome (Z) revealed male-biased demographic patterns in peninsular populations. Future projections indicated habitat suitability reductions for peninsular populations, highlighting conservation concerns. These findings demonstrate that past climate shaped genetic diversity of cactus wrens.

15
Loss-of-function phenomics, ncORFs, and ambiguity of mutant phenotypes in Medicago truncatula

Cakir, U.; Gabed, N.; Kaya, S.; Benedito, V. A.; Brunet, M. A.; Roucou, X.; Kryvoruchko, I. S.

2026-03-10 genetics 10.64898/2026.03.07.710271 medRxiv
Top 0.1%
3.6%

Non-canonical open reading frames (ncORFs) are an emerging area of research that is quickly gaining momentum. Many peptides and proteins missed in initial annotation efforts (ncProts) were subsequently shown to be crucial for a wide range of biological processes. The discovery of ncORFs continues to improve the accuracy of loss-of-function studies because they often occupy the same genomic spaces as annotated ORFs. While databases of mutant phenotypes linked to genomic loci are available in a few species, none of these databases integrate the information on ncORFs present in already characterized loci. In this study, we introduce a nearly comprehensive loss-of-function phenomics dataset of Medicago truncatula (673 loci characterized over the past 30 years), which should become an integral part of the genome browser of this organism. We used this dataset to provide a critical analysis of the potential contribution of ncORFs to published phenotypes. We detected mass spectrometry (MS)-validated ncORFs in 10 functionally characterized genes, including major regulators of development and symbiotic relationships. We also found conserved ncORFs in 113 characterized genes, including four genes with highly conserved ncORFs. We show that in some studies, the contribution of ncORFs can be ruled out, while in others it cannot. Using real examples, we systematized ambiguities associated with ncORFs. Furthermore, we highlighted little-known trans effects of insertional mutagenesis on splicing as contributors to that ambiguity. Finally, our meta-analysis of published phenotypes indicates that different protein classes have significantly different (unique) proportions of unconditional, conditional, and neutral phenotypes, potentially reflecting their relative functional importance. 
Significance statement: This study is the first to merge a nearly comprehensive inventory of loss-of-function studies in a eukaryotic organism with the information on novel MS-validated and conserved ncORFs.

16
A pilot study for whole proteome tagging in C. elegans

Eroglu, M.; Hobert, O.

2026-02-10 genetics 10.64898/2026.02.09.704846 medRxiv
Top 0.2%
3.6%

Tagging all proteins encoded by an animal genome with a fluorescent tag would open many windows to the discovery of unexpected patterns of protein expression and localization. To scale such an approach, it would be beneficial to introduce multiple, spectrally distinct fluorophore tags in parallel. As a first step in this direction, we undertook a pilot study in the nematode C. elegans, in which we set out to tag 30 different genetic loci with three different fluorophores, with 3 tags being introduced at a time. By choosing essential genes, predicted based on transcriptomics to cover a range of expression levels, we explore issues relating to disrupting gene function and visibility of tagged proteins. We demonstrate that such a tagging approach is highly efficient and indeed reveals unanticipated patterns of cellular sites of expression, as well as subcellular protein localization. We hope that this pilot study will motivate attempts to scale this tagging approach to more loci and, ultimately, the whole genome.

17
Validation of a Single Nepl15 Transcript in Oregon-R Drosophila melanogaster with Minor Coding Sequence Variation Relative to FlyBase

Drucker, C.; Banerjee, S.

2026-02-21 genomics 10.64898/2026.02.20.707127 medRxiv
Top 0.2%
3.5%

The Neprilysin-like 15 (Nepl15) gene in Drosophila melanogaster displays sex- and organ-specific phenotypes and has only one known transcript in the fly database (FlyBase.org). Given that Nepl15 is differentially expressed in a tissue-specific and sex-specific manner, we used sequencing-based approaches to determine whether Oregon-R strain flies carry additional Nepl15 transcripts that have not been reported in FlyBase. We identified codons in the transcript that differ from those reported in FlyBase. Further experimentation is needed to determine the full effect of these changes on fly physiology.

18
Transposons contribute to splice-isoform diversity in the Drosophila brain

Choucri, M.; Treiber, C. D.

2026-01-26 genomics 10.64898/2026.01.22.701052 medRxiv
Top 0.2%
3.5%

The extraordinary complexity of the brain depends in part on the vast diversity of mRNA isoforms it expresses, often in a cell-type-specific manner. In a previous study, we found that intronic transposable elements (TEs) are spliced into neural transcripts and diversify the splice isoform repertoire of neurons and glia (Treiber and Waddell, 2020). A recent paper by Azad et al. revisits these findings using their TIDAL analysis pipeline applied to our published data (Azad et al., 2024). Their analysis did not find any of the splicing reads we reported, and although they used RT-PCR to test seven of the 264 TE-gene pairs we had previously reported, they failed to validate TE-gene splicing in any of them. Here, we conduct a quantitative analysis of TE exonisation and show that intronic TE insertions are frequently recruited as alternative exons, with exon usage ranging from rare events to near-complete inclusion in transcripts. We implement this analysis in an improved version of our TEChim software, and present clear support for TE-gene splicing at the seven loci tested by Azad et al. We also identify methodological issues in the experimental and computational design of the Azad et al. study that likely explain their failure to detect TE-gene chimeras, while demonstrating that TE-gene splicing can be detected by RT-PCR under appropriate experimental conditions. Together, our data demonstrate that TE splice isoforms are not rare artefacts but measurable and biologically relevant features of the Drosophila brain transcriptome that may contribute to the molecular complexity and functional adaptability of the brain.

19
Genomic selection validated across two generations of loblolly pine breeding

Isik, F.; Cooperative Tree Improvement Program; Shalizi, M. N.; Walker, T. D.

2026-01-24 genetics 10.64898/2026.01.22.701135 medRxiv
Top 0.2%
3.5%

This study evaluated the effectiveness of genomic selection (GS) in loblolly pine (Pinus taeda) using a two-generation closed breeding population and a genetically diverse Mainline population. Single-step genomic best linear unbiased prediction (ssGBLUP) models were used to include all phenotypic, genotypic, and pedigree information. Prediction accuracies of genomic estimated breeding values reached up to 0.70 for stem volume and stem straightness. Prediction accuracy showed a strong linear relationship with mean relatedness between training and validation populations (r > 0.92). Adjusting the scaling between genomic and pedigree relationship matrices improved model stability, increased prediction accuracy, and reduced bias in genomic estimated breeding values. Estimates of heritability and variance components from ssGBLUP were consistent with pedigree-based models, particularly when genomic relationships were properly scaled. Genomic selection yielded approximately 50% more genetic gain per year than conventional selection. Overall, these results demonstrate that GS can be effectively integrated into operational conifer breeding programs, given sustained investment in large, well-connected training populations with high-quality phenotypic data. We also outline the planned implementation of GS in the North Carolina State University Cooperative Tree Improvement Program to increase genetic gain.

20
Using Variable Window Sizes for Phylogenomic Analyses of Whole Genome Alignments

Ivan, J.; Lanfear, R.

2026-03-06 bioinformatics 10.64898/2026.03.04.709403 medRxiv
Top 0.2%
3.5%

Many phylogenomic studies have used non-overlapping windows to address gene tree discordance across a set of aligned genomes. Recently, Ivan et al. (2025) proposed an information-theoretic approach to choose an optimal window size given the alignment. However, this approach selects only a single fixed window size per chromosome, which is a useful first step but fails to account for variation in the size of non-recombining regions along each chromosome. Such variation is expected to occur due to the stochastic nature of recombination as well as the variation in recombination rates along chromosomes. In this study, we extend the approach of Ivan et al. (2025) to allow window sizes to vary across the chromosome, using a splitting-and-merging strategy that allows each window to be of an arbitrary length. We showed that the new method outperformed the fixed-window approach in recovering gene tree topologies on a wide range of simulated datasets. Applying the new method to the genomes of seven Heliconius butterflies, we found that the average window sizes for the group ranged from 538 to 808 bp, but with a very similar distribution of gene tree topologies compared to previous studies that used fixed window sizes. For the genomes of great apes, the average window sizes ranged from 4.2 kb to 6.2 kb, with the proportion of the major topology (i.e., grouping human and chimpanzee together) reaching approximately 80%. In conclusion, our study highlights the limitations of using a fixed window size when recombination rates vary across the chromosomes, and proposes a splitting-and-merging approach that allows for variable window sizes across whole genome alignments.